Abstract: The problem of joint universal source coding and modeling, addressed by Rissanen in the context of lossless codes, is generalized to fixed-rate lossy coding of continuous-alphabet memoryless sources. We show that, for bounded distortion measures, any compactly parametrized family of i.i.d. real vector sources with absolutely continuous marginals (satisfying appropriate smoothness and Vapnik-Chervonenkis learnability conditions) admits a joint scheme for universal lossy block coding and parameter estimation, and we give nonasymptotic estimates of convergence rates for distortion redundancies and variational distances between the active source and the estimated source. We also present explicit examples of parametric sources admitting such joint universal compression and modeling schemes.

I. Introduction 

In universal data compression, a single code achieves 
asymptotically optimal performance on all sources within a 
given family. Intuition suggests that a good universal coder 
should acquire an accurate model of the source statistics 
from a sufficiently long data sequence and incorporate this 
knowledge in its operation. For lossless codes, this intuition 
has been made rigorous by Rissanen [1]. Under his scheme, 
the data are encoded in a two-stage set-up, in which the binary 
representation of each source block consists of two parts: (1) a 
suitably quantized maximum-likelihood estimate of the source 
parameters, and (2) lossless encoding of the data matched 
to the acquired model; the redundancy of the resulting code
converges to zero as $O(\log n/n)$, where $n$ is the block length.

In this paper, we extend Rissanen's idea to lossy block coding (vector quantization) of i.i.d. sources with values in $\mathbb{R}^d$ for some finite $d$. Specifically, let $\{X_i\}_{i=-\infty}^{\infty}$ be an i.i.d. source with the marginal distribution of $X_1$ belonging to some indexed class $\{P_\theta : \theta \in \Theta\}$ of absolutely continuous distributions on $\mathbb{R}^d$, where $\Theta$ is a bounded subset of $\mathbb{R}^k$ for some $k$. For bounded distortion measures, our main result, Theorem 1, states that if the class $\{P_\theta\}$ satisfies certain smoothness and learnability conditions, then there exists a sequence of finite-memory lossy block codes that achieves asymptotically optimal compression of each source in the class and permits asymptotically exact identification of the active source with respect to the variational distance, defined as $d_V(P,Q) = \sup_B |P(B) - Q(B)|$, where the supremum is over all Borel subsets $B$ of $\mathbb{R}^d$. The overhead rate and the distortion redundancy of the scheme converge to zero as $O(\log n/n)$ and $O(\sqrt{\log n/n})$, respectively, where $n$ is the block length, while the active source can be identified up to a variational ball of radius $O(\sqrt{\log n/n})$ eventually almost surely. We also describe an extension of our scheme to unbounded distortion measures satisfying a certain moment condition, and present two examples of parametric families satisfying the regularity conditions of Theorem 1.
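On a finite alphabet the supremum in the definition of $d_V$ can be evaluated by brute force, which gives a quick sanity check of the standard identity $d_V(P,Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$ (the supremum is attained at $B = \{x : P(x) > Q(x)\}$). The Python sketch below assumes finitely supported distributions given as probability vectors; the paper itself works with densities on $\mathbb{R}^d$, where the sum becomes $\frac{1}{2}\int |p - q|\,dx$.

```python
from itertools import combinations

def dv_bruteforce(p, q):
    """sup over all events B of |P(B) - Q(B)|, enumerating every subset of the support."""
    support = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for event in combinations(support, size):
            diff = abs(sum(p[i] for i in event) - sum(q[i] for i in event))
            best = max(best, diff)
    return best

def dv_half_l1(p, q):
    """Closed form: d_V(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
print(dv_bruteforce(p, q), dv_half_l1(p, q))  # prints: 0.25 0.25
```

The brute-force search is exponential in the alphabet size and is only meant to confirm the closed form on small examples.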

While most existing schemes for universal lossy coding rely on implicit identification of the active source (e.g., through topological covering arguments [2], Glivenko-Cantelli uniform laws of large numbers [3], or nearest-neighbor code clustering [4]), our code builds an explicit model of the mechanism responsible for generating the data and then selects an appropriate code for the data on the basis of the model. This ability to simultaneously model and compress the data may prove useful in such applications as media forensics [5], where the parameter $\theta$ could represent evidence of tampering, and the aim is to compress the data in such a way that the evidence can be later extracted with high fidelity from the compressed version. Another key feature of our approach is the use of Vapnik-Chervonenkis theory [6] in order to connect universal encodability of a class of sources to the combinatorial "richness" of a certain collection of decision regions associated with the sources. In a way, Vapnik-Chervonenkis estimates can be thought of as an (imperfect) analogue of the combinatorial method of types for finite alphabets [7].

II. Preliminaries 

Let $\{X_i\}_{i=-\infty}^{\infty}$ be an i.i.d. source with alphabet $\mathcal{X}$, such that the marginal distribution of $X_1$ comes from an indexed class $\{P_\theta : \theta \in \Theta\}$. For any $t \in \mathbb{Z}$ and any $m, n \ge 0$, let $X^m_n(t)$ denote the segment $(X_{tn-m+1}, X_{tn-m+2}, \ldots, X_{tn})$ of $\{X_i\}$, with $X^0_n(t)$ understood to denote an empty string for all $n, t$. We shall abbreviate $X^n_n(t)$ to $X^n(t)$.

Consider coding $\{X_i\}$ into the reproduction process $\{\hat X_i\}$ with alphabet $\hat{\mathcal{X}}$ by means of a stationary lossy code with block length $n$ and memory length $m$ [an $(n,m)$-block code, for brevity]. Such a code consists of an encoder $f : \mathcal{X}^n \times \mathcal{X}^m \to \mathcal{S}$ and a decoder $\phi : \mathcal{S} \to \hat{\mathcal{X}}^n$, where $\mathcal{S}$ is a collection of fixed-length binary strings: $\hat X^n(t) = \phi(f(X^n(t), X^m_n(t-1)))$, $\forall t \in \mathbb{Z}$. That is, the encoding is done in blocks of length $n$, but the encoder is also allowed to observe a fixed finite amount of past data. Abusing notation, we shall denote by $C^{n,m}$ both the composition $\phi \circ f$ and the encoder-decoder pair $(f, \phi)$; when $m = 0$, we shall use the more compact notation $C^n$. The number $R(C^{n,m}) = n^{-1}\log|\mathcal{S}|$ is the rate of $C^{n,m}$ in bits per letter. The distortion between the source $n$-block $X^n = (X_1, \ldots, X_n)$ and its reproduction $\hat X^n = (\hat X_1, \ldots, \hat X_n)$ is given by $\rho(X^n, \hat X^n) = \sum_{i=1}^n \rho(X_i, \hat X_i)$, where $\rho : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$ is a single-letter distortion measure.
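The indexing and the rate and distortion definitions above can be made concrete with a short Python sketch. The `segment` helper follows the definition of $X^m_n(t)$; everything else is a hypothetical choice for illustration only: the code is a uniform scalar quantizer on $[0,1]$ that receives its memory input but ignores it, the source is Uniform$[0,1]$, and $\rho$ is squared error.

```python
import math
import random

def segment(x, t, n, m):
    """Return X_n^m(t) = (X_{tn-m+1}, ..., X_{tn}); the empty tuple when m == 0.

    `x` maps integer time indices to samples (a dict here).
    """
    return tuple(x[i] for i in range(t * n - m + 1, t * n + 1))

def make_scalar_code(bits_per_letter):
    """Hypothetical (n, m)-block code: a uniform scalar quantizer on [0, 1], applied letter by letter."""
    levels = 2 ** bits_per_letter

    def encoder(block, memory):
        # `memory` = X_n^m(t-1) is available to the encoder; this toy rule simply ignores it
        return tuple(min(int(v * levels), levels - 1) for v in block)

    def decoder(indices):
        # reproduce each letter by the midpoint of its quantizer cell
        return tuple((j + 0.5) / levels for j in indices)

    return encoder, decoder, levels

def rate(n, num_codewords):
    # R(C^{n,m}) = n^{-1} log2 |S|, in bits per letter
    return math.log2(num_codewords) / n

random.seed(0)
n, m, T = 4, 2, 500                     # block length, memory length, number of blocks
x = {i: random.random() for i in range(-m + 1, T * n + 1)}   # i.i.d. Uniform[0, 1] samples
enc, dec, levels = make_scalar_code(2)
total = 0.0
for t in range(1, T + 1):
    block = segment(x, t, n, n)         # X^n(t)
    memory = segment(x, t - 1, n, m)    # X_n^m(t-1), the m letters just before the block
    xhat = dec(enc(block, memory))
    total += sum((a - b) ** 2 for a, b in zip(block, xhat))   # rho(X^n, Xhat^n), squared error
avg_distortion = total / (T * n)        # empirical estimate of n^{-1} E[rho(X^n, Xhat^n)]
print(rate(n, levels ** n), avg_distortion)
```

With 2 bits per letter the index set has $|\mathcal{S}| = 4^n$ elements, so the rate is exactly 2 bits per letter, and the empirical distortion should be close to $1/192$, the mean squared error of a 4-level uniform quantizer on a uniform source.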

Suppose $X_1 \sim P_\theta$ for some $\theta \in \Theta$. Since the source is i.i.d., and the code $C^{n,m}$ does not vary with time, the process $\{(X_i, \hat X_i)\}$ is $n$-stationary, and the average distortion of $C^{n,m}$ is $D_\theta(C^{n,m}) = n^{-1}\mathbb{E}_\theta[\rho(X^n(1), \hat X^n(1))]$, where $\hat X^n(1) = \phi(f(X^n(1), X^m_n(0)))$. The optimal performance achievable on $P_\theta$ by any finite-memory $n$-block code at rate $R$ is given by the $n$th-order operational distortion-rate function (DRF)

$$D^{n,*}_\theta(R) = \inf_{m \ge 0}\ \inf_{C^{n,m} : R(C^{n,m}) \le R} D_\theta(C^{n,m}),$$

where the asterisk denotes the fact that we allow any finite memory length. For zero-memory $n$-block codes the corresponding DRF is

$$D^n_\theta(R) \triangleq \inf_{C^n : R(C^n) \le R} D_\theta(C^n).$$

Clearly, $D^{n,*}_\theta(R) \le D^n_\theta(R)$. Conversely, we can use memoryless minimum-distortion encoders to convert any $(n,m)$-block code into a zero-memory $n$-block code without increasing either distortion or rate, so $D^n_\theta(R) = D^{n,*}_\theta(R)$. Finally, the best performance achievable by any block code, with or without memory, is given by the operational DRF $\hat D_\theta(R) = \inf_{n \ge 1} D^n_\theta(R) = \lim_{n \to \infty} D^n_\theta(R)$; since the source is i.i.d., $\hat D_\theta(R)$ is equal to the Shannon DRF $D_\theta(R)$ by the source coding theorem and its converse.
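The conversion argument above (replacing an arbitrary encoder, possibly with memory, by a memoryless minimum-distortion encoder for the same decoder) can be checked numerically. In this Python sketch the codebook, the memory-dependent encoder rule, and the squared-error distortion are all hypothetical choices made for illustration. Because the minimum-distortion encoder searches the same index set $\mathcal{S}$, the rate is unchanged, and its distortion is no larger on every single block.

```python
import random

def block_distortion(xs, ys):
    # rho(x^n, y^n) as a sum of single-letter squared errors (illustrative choice)
    return sum((a - b) ** 2 for a, b in zip(xs, ys))

def min_distortion_encoder(codebook):
    """Memoryless encoder for the decoder phi(s) = codebook[s]: pick the closest reproduction."""
    def encode(block):
        return min(range(len(codebook)), key=lambda s: block_distortion(block, codebook[s]))
    return encode

def memory_encoder(codebook):
    """Hypothetical encoder with memory: when the previous block has a large sum, it sends index 0."""
    def encode(block, memory):
        if sum(memory) > len(memory) / 2:
            return 0
        return min(range(len(codebook)), key=lambda s: block_distortion(block, codebook[s]))
    return encode

random.seed(1)
n, num_codewords = 3, 8                              # |S| = 8, so both codes have rate log2(8)/3 = 1 bit/letter
codebook = [tuple(random.random() for _ in range(n)) for _ in range(num_codewords)]
f_mem, f_star = memory_encoder(codebook), min_distortion_encoder(codebook)

prev = tuple(random.random() for _ in range(n))      # plays the role of X_n^m(t-1) with m = n
d_mem = d_star = 0.0
for _ in range(1000):
    block = tuple(random.random() for _ in range(n))
    e_mem = block_distortion(block, codebook[f_mem(block, prev)])
    e_star = block_distortion(block, codebook[f_star(block)])
    assert e_star <= e_mem                           # holds block by block, not only on average
    d_mem, d_star, prev = d_mem + e_mem, d_star + e_star, block
print(d_star / 1000, d_mem / 1000)
```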

We are interested in sequences of codes that asymptotically achieve optimal performance across the entire class $\{P_\theta\}$. Let $\{C^{n,m}\}_{n=1}^{\infty}$ be a sequence of $(n,m)$-block codes, where the memory length $m$ may depend on $n$, such that $R(C^{n,m}) \to R$. Then $\{C^{n,m}\}$ is weakly minimax universal for $\{P_\theta\}$ if the distortion redundancy $\delta_\theta(C^{n,m}) = D_\theta(C^{n,m}) - D_\theta(R)$ converges to zero as $n \to \infty$ for every $\theta \in \Theta$.¹ We shall follow Chou et al. [4] and split $\delta_\theta(C^{n,m})$ into two terms:

$$\delta_\theta(C^{n,m}) = \big(D_\theta(C^{n,m}) - D^n_\theta(R)\big) + \big(D^n_\theta(R) - D_\theta(R)\big).$$

The first term, which we shall call the $n$th-order redundancy and denote by $\delta^n_\theta(C^{n,m})$, is the excess distortion of $C^{n,m}$ relative to the best $n$-block code for $P_\theta$, while the second term gives the extent to which the best $n$-block code falls short of the Shannon optimum. Note that $\delta_\theta(C^{n,m}) \to 0$ if and only if $\delta^n_\theta(C^{n,m}) \to 0$, since $D^n_\theta(R) \to D_\theta(R)$ by the source coding theorem.

III. Informal description of the system 

As stated in the Introduction, we are after a sequence of lossy block codes that would not only be universally optimal for a given class $\{P_\theta\}$ of i.i.d. sources with values in $\mathbb{R}^d$, but would also permit asymptotically reliable identification of the source parameter $\theta \in \Theta$. We formally state and prove our result in Section IV; here we outline the main idea behind it.

¹See [2] for other notions of universality for source codes.



Fix the block length $n$, and denote by $X^n$ the current $n$-block $X^n(t)$ and by $Z^n$ the preceding $n$-block $X^n(t-1)$. Let us assume that we can find for each $\theta \in \Theta$ an $n$-block code $C^n_\theta$ at the desired rate $R$ which achieves the $n$th-order DRF for $P_\theta$: $D_\theta(C^n_\theta) = D^n_\theta(R)$. The basic idea is to construct an $(n,n)$-block code $C^{n,n}$ that first estimates the parameter $\theta$ of the active source from $Z^n$ and then codes $X^n$ with the code $C^n_{\tilde\theta(Z^n)}$, where $\tilde\theta(Z^n)$ is the estimate of $\theta$, suitably quantized.

Suppose the encoder can use $Z^n$ to identify the active source up to a variational ball of radius $O(\sqrt{\log n/n})$. Next, suppose that the parameters of the estimated source (assumed to belong to a bounded subset of $\mathbb{R}^k$ for some $k$) are quantized to $O(\log n)$ bits in such a way that the variational distance between any two sources whose parameters lie in the same quantizer cell is $O(\sqrt{1/n})$. If $\tilde\theta = \tilde\theta(Z^n)$ is the quantized parameter estimate, then, by the triangle inequality, the variational distance between $P_{\tilde\theta}$ and the "true" source $P_\theta$ is $O(\sqrt{\log n/n})$, which for bounded distortion functions implies an $O(\sqrt{\log n/n})$ upper bound on the distortion redundancy $\delta_\theta(C^n_{\tilde\theta})$.

More formally, let $\tilde f : \mathcal{X}^n \to \tilde{\mathcal{S}}$ denote the map that sends each $Z^n$ to the binary representation of $\tilde\theta(Z^n)$, where $\tilde{\mathcal{S}}$ is a collection of fixed-length binary strings with $\log|\tilde{\mathcal{S}}| = O(\log n)$, and let $\tilde\psi : \tilde{\mathcal{S}} \to \Theta$ be the parameter decoder that maps each $\tilde s \in \tilde{\mathcal{S}}$ to its reproduction: $\tilde\theta(Z^n) = \tilde\psi(\tilde f(Z^n))$. Thus, to each $\tilde s \in \tilde{\mathcal{S}}$ there corresponds an $n$-block code $C^n_{\tilde\psi(\tilde s)}$, which we denote more compactly by $C^n_{\tilde s} = (f_{\tilde s}, \phi_{\tilde s})$. Our $(n,n)$-block code $C^{n,n}$ thus has the encoder $f(X^n, Z^n) = \tilde f(Z^n)\, f_{\tilde f(Z^n)}(X^n)$ (a concatenation of two binary strings) and the corresponding decoder $\phi\big(\tilde f(Z^n)\, f_{\tilde f(Z^n)}(X^n)\big) \triangleq \phi_{\tilde f(Z^n)}\big(f_{\tilde f(Z^n)}(X^n)\big)$. That is, the binary string emitted by the encoder consists of two parts: (a) the header containing a binary description of the chosen code [equivalently, of the estimated source $P_{\tilde\theta(Z^n)}$], and (b) the body containing the binary description of the data $X^n$ using the chosen code at rate $R$. The combined rate of $C^{n,n}$ is $R + n^{-1}\log|\tilde{\mathcal{S}}| = R + O(\log n/n)$ bits per letter, while, since $X^n$ and $Z^n$ are independent, the expected distortion with respect to $P_\theta$ is

$$D_\theta(C^{n,n}) = \mathbb{E}_\theta\big[D_\theta(C^n_{\tilde\theta(Z^n)})\big]. \tag{1}$$

This scheme is universal because the map $\tilde f$ and the subcodes $C^n_{\tilde s}$ are chosen so that $\mathbb{E}_\theta[D_\theta(C^n_{\tilde\theta(Z^n)})] - D^n_\theta(R) = O(\sqrt{\log n/n})$ for each $\theta \in \Theta$. Note that the decoder can not only decode the data in a near-optimal fashion, but also identify the active source up to a variational ball of radius $O(\sqrt{\log n/n})$.
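The two-part structure of $C^{n,n}$ (a header describing $\tilde\theta(Z^n)$, followed by a body that codes $X^n$ with $C^n_{\tilde\theta(Z^n)}$) can be sketched in a few lines of Python. Everything specific here is a hypothetical toy rather than the construction of Section IV: the family is $P_\theta = \mathrm{Uniform}[0,\theta]$ with $\theta \in [1,2]$, the estimator is the clipped sample maximum rather than a minimum-distance estimator, and each subcode $C^n_\theta$ is a uniform $R$-bit scalar quantizer on $[0,\theta]$ rather than an optimal $n$-block code.

```python
import math
import random

THETA_LO, THETA_HI = 1.0, 2.0          # hypothetical parameter set Theta = [1, 2] (unit length)

def cells_per_unit(n):
    return math.isqrt(n - 1) + 1       # ceil(sqrt(n)) for n >= 1: cells of width 1/ceil(sqrt(n))

def first_stage_encoder(z):
    """f~: Z^n -> header index. Toy estimator (clipped sample maximum), then grid quantization."""
    m = cells_per_unit(len(z))         # Theta has unit length, so there are m cells
    theta_hat = min(max(max(z), THETA_LO), THETA_HI)
    return min(int((theta_hat - THETA_LO) * m), m - 1)

def param_decoder(j, n):
    """psi~: header index -> reproduction parameter (midpoint of cell j)."""
    return THETA_LO + (j + 0.5) / cells_per_unit(n)

def subcode_encode(x, theta, R):
    levels = 2 ** R                    # C^n_theta: uniform R-bit scalar quantizer on [0, theta]
    return [min(int(v / theta * levels), levels - 1) for v in x]

def subcode_decode(idx, theta, R):
    levels = 2 ** R
    return [(i + 0.5) * theta / levels for i in idx]

def two_stage_encode(x, z, R):
    header = first_stage_encoder(z)                              # describes theta~(Z^n)
    body = subcode_encode(x, param_decoder(header, len(x)), R)   # X^n coded with C^n_{theta~}
    return header, body

def two_stage_decode(header, body, R):
    theta_tilde = param_decoder(header, len(body))               # source identification
    return theta_tilde, subcode_decode(body, theta_tilde, R)     # data reconstruction

def total_rate(n, R):
    header_bits = math.ceil(math.log2(cells_per_unit(n)))        # ceil(sqrt(n)) possible headers
    return R + header_bits / n                                    # R + O(log n / n)

random.seed(2)
theta, n, R = 1.37, 400, 3
z = [random.uniform(0, theta) for _ in range(n)]     # preceding block Z^n
x = [random.uniform(0, theta) for _ in range(n)]     # current block X^n
header, body = two_stage_encode(x, z, R)
theta_tilde, xhat = two_stage_decode(header, body, R)
mse = sum((a - b) ** 2 for a, b in zip(x, xhat)) / n
print(total_rate(n, R), theta_tilde, mse)
```

For $n = 400$ the header costs $\lceil \log_2 20 \rceil = 5$ bits, i.e. $0.0125$ bits per letter on top of $R$; this overhead shrinks like $\log n/n$, while the decoder recovers both the data and the quantized parameter $\tilde\theta$.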

We remark that our scheme is a modification of the two-stage code of Chou et al. [4], the difference being that here the subcode $C^n_{\tilde\theta(Z^n)}$, used to encode the current $n$-block $X^n$, is selected on the basis of the preceding $n$-block $Z^n$. Nonetheless, we shall adopt the terminology of [4] and refer to $\tilde f$ as the first-stage encoder. The structure of the encoder and the decoder in our scheme is displayed in Fig. 1.

Fig. 1. The two-stage scheme for joint universal lossy source coding and identification.



IV. The main theorem 

Before stating and proving our result, let us list our assumptions, as well as fix some auxiliary results and notation.
The source models. Let $\{X_i\}_{i=-\infty}^{\infty}$ be an i.i.d. source with alphabet $\mathcal{X}$ and with the marginal distribution of $X_1$ belonging to an indexed class $\{P_\theta : \theta \in \Theta\}$, such that the following conditions are satisfied:

(S.1) $\mathcal{X}$ is a measurable subset of $\mathbb{R}^d$.

(S.2) Each $P_\theta$ is absolutely continuous with pdf $p_\theta$.

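Distortion function. The distortion function $\rho$ is assumed to satisfy the following requirements:

(D.1) $\inf_{\hat x \in \hat{\mathcal{X}}} \rho(x, \hat x) = 0$ for all $x \in \mathcal{X}$.

(D.2) $\sup_{x \in \mathcal{X},\, \hat x \in \hat{\mathcal{X}}} \rho(x, \hat x) = K < \infty$.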

Under these conditions, it can be proved (in a manner similar to the proof of Thm. 2 of Linder et al. [3]) that

$$D^n_\theta(R) = D_\theta(R) + O\big(\sqrt{\log n/n}\big) \tag{2}$$

for each $\theta \in \Theta$ and for all rates $R$ such that $D_\theta(R) > 0$ (this condition is automatically satisfied in our case since all the $P_\theta$'s are absolutely continuous). The constant implicit in $O(\cdot)$ depends on both $\theta$ and $R$.

Vapnik-Chervonenkis theory. Given a collection $\mathcal{A}$ of measurable subsets of $\mathbb{R}^d$, its Vapnik-Chervonenkis (VC) dimension $V(\mathcal{A})$ is defined as the largest integer $n$ for which

$$\max_{x^n \in (\mathbb{R}^d)^n} \Big|\big\{(1_{\{x_1 \in A\}}, \ldots, 1_{\{x_n \in A\}}) : A \in \mathcal{A}\big\}\Big| = 2^n; \tag{3}$$

if (3) holds for all $n$, then $V(\mathcal{A}) = \infty$. If $V(\mathcal{A}) < \infty$, we say that $\mathcal{A}$ is a VC class. For any such class, one can give finite-sample bounds on uniform deviations of probabilities of events in that class from their relative frequencies. That is, if $X^n = (X_1, \ldots, X_n)$ is an i.i.d. sample from a distribution $P$, and if $\mathcal{A}$ is a VC class with $V(\mathcal{A}) \ge 2$, then

$$\Pr\Big\{\sup_{A \in \mathcal{A}} |P_{X^n}(A) - P(A)| > \epsilon\Big\} \le 8 n^{V(\mathcal{A})} e^{-n\epsilon^2/32}, \qquad \forall\, \epsilon > 0,$$

and

$$\mathbb{E}\Big[\sup_{A \in \mathcal{A}} |P_{X^n}(A) - P(A)|\Big] \le c\sqrt{\log n/n},$$

where $P_{X^n}$ is the empirical distribution of $X^n$ and the constant $c$ depends on $V(\mathcal{A})$ but not on $P$.² (See, e.g., [6] for details.)
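Definition (3) can be explored by brute force for simple classes on the real line. The Python sketch below, an illustration not taken from the paper, counts the distinct labelings that closed intervals $[a,b]$ and half-lines $(-\infty,a]$ induce on points drawn from a small hypothetical pool. Since (3) maximizes over all point configurations, checking a finite pool only certifies a lower bound on $V$; here the bounds are tight, because no interval can label three ordered points as $(1,0,1)$ and no half-line can label two ordered points as $(0,1)$, so intervals have VC dimension 2 and half-lines VC dimension 1.

```python
from itertools import combinations

def interval_labelings(points):
    """Distinct vectors (1{x_1 in A}, ..., 1{x_n in A}) as A ranges over closed intervals [a, b]."""
    xs = sorted(set(points))
    cands = [xs[0] - 1.0] + xs          # endpoints at the points (plus one below them) realize every labeling
    return {tuple(int(a <= p <= b) for p in points) for a in cands for b in cands if a <= b}

def halfline_labelings(points):
    """Distinct label vectors as A ranges over half-lines (-inf, a]."""
    xs = sorted(set(points))
    return {tuple(int(p <= a) for p in points) for a in [xs[0] - 1.0] + xs}

def shatters(labelings, points):
    return len(labelings(points)) == 2 ** len(points)

def largest_shattered(labelings, pool, max_n):
    """Largest n <= max_n such that some n-subset of `pool` is shattered (a lower bound on V)."""
    best = 0
    for n in range(1, max_n + 1):
        if any(shatters(labelings, list(c)) for c in combinations(pool, n)):
            best = n
    return best

pool = [0.1, 0.4, 0.7, 1.3, 2.0]
print(largest_shattered(interval_labelings, pool, 4), largest_shattered(halfline_labelings, pool, 4))  # prints: 2 1
```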

Theorem 1. Let $\{X_i\}_{i=-\infty}^{\infty}$ be an i.i.d. source satisfying Conditions (S.1) and (S.2), and let $\rho$ be a distortion function satisfying Conditions (D.1) and (D.2). Assume the following:

1) $\Theta$ is a bounded subset of $\mathbb{R}^k$ for some $k$.

2) The map $\theta \mapsto P_\theta$ is uniformly locally Lipschitz: there exist constants $r, \beta > 0$ such that, for each $\theta \in \Theta$, $d_V(P_\theta, P_\eta) \le \beta\|\theta - \eta\|$ for all $\eta \in B_r(\theta)$, where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^k$ and $B_r(\theta)$ is an open ball of radius $r$ centered at $\theta$.

3) The collection $\mathcal{A}_\Theta$ of all sets of the form $A_{\theta,\eta} = \{x \in \mathcal{X} : p_\theta(x) > p_\eta(x)\}$ with $\theta \ne \eta$ (the so-called Yatracos class associated with $\{P_\theta\}$ [8], [9], [10]) is a VC class.

Also, suppose that for each $n$ and $\theta$ there exists an $n$-block code $C^n_\theta = (f_\theta, \phi_\theta)$ at a rate of $R$ bits per letter achieving the $n$th-order operational DRF for $\theta$: $D_\theta(C^n_\theta) = D^n_\theta(R)$. Then there exists an $(n,n)$-block code $C^{n,n}$ with

$$R(C^{n,n}) = R + O(\log n/n), \tag{4}$$

such that for every $\theta \in \Theta$

$$\delta_\theta(C^{n,n}) = O\big(\sqrt{\log n/n}\big). \tag{5}$$

Therefore, the sequence of codes $\{C^{n,n}\}_{n=1}^{\infty}$ is weakly minimax universal for $\{P_\theta : \theta \in \Theta\}$ at rate $R$. Furthermore, for each $n$ the first-stage encoder $\tilde f$ and the corresponding parameter decoder $\tilde\psi$ are such that

$$d_V\big(P_\theta, P_{\tilde\psi(\tilde f(Z^n))}\big) = O\big(\sqrt{\log n/n}\big) \quad P_\theta\text{-a.s.} \tag{6}$$

The constants implicit in the $O(\cdot)$ notation in (4) and (6) are independent of $\theta$.

²Using more refined techniques, the $c\sqrt{\log n/n}$ bound can be improved to $c'/\sqrt{n}$, where $c'$ is another constant, but $c'$ is much larger than $c$, so any benefit of the new bound shows only for impractically large values of $n$.



Proof: The proof is by construction of a two-stage $(n,n)$-block code as outlined in Sec. III. As before, let $X^n$ denote the current $n$-block $X^n(t)$, and let $Z^n$ be the preceding $n$-block $X^n(t-1)$. We first define our first-stage encoder $\tilde f$, which we shall realize as the composition $g \circ \hat\theta$ of a parameter estimator $\hat\theta$ and a lossy parameter encoder $g$ (cf. Fig. 1). For any $z^n \in \mathcal{X}^n$ and for any $\theta \in \Theta$, let

$$\Delta_\theta(z^n) = \sup_{A \in \mathcal{A}_\Theta} |P_\theta(A) - P_{z^n}(A)|,$$

where $P_\theta(A) = \int_A p_\theta(x)\,dx$ and $P_{z^n}$ is the empirical distribution of $z^n$. Define $\hat\theta(z^n)$ as any $\theta^* \in \Theta$ such that $\Delta_{\theta^*}(z^n) < \inf_{\theta \in \Theta} \Delta_\theta(z^n) + 1/n$, where the extra $1/n$ ensures that at least one such $\theta^*$ exists. The map $z^n \mapsto \hat\theta(z^n)$ is the so-called minimum-distance density estimator of Devroye and Lugosi [9], [10], which satisfies

$$d_V(P_\theta, P_{\hat\theta(Z^n)}) \le 2\Delta_\theta(Z^n) + \frac{3}{2n}. \tag{7}$$

Since $\mathcal{A}_\Theta$ is a VC class,

$$\mathbb{E}_\theta\big[d_V(P_\theta, P_{\hat\theta(Z^n)})\big] \le c\sqrt{\log n/n} + \frac{3}{2n} \tag{8}$$

for each $\theta \in \Theta$, where $c > 0$ depends on $V(\mathcal{A}_\Theta)$.
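The minimum-distance estimator $\hat\theta$ is straightforward to implement once the Yatracos sets are explicit. The Python sketch below is an illustration under assumptions not made in the paper: the family is the unit-variance Gaussian location family, and $\Theta$ is replaced by a finite grid of candidate means. For this family and $\theta > \eta$, the set $A_{\theta,\eta} = \{x : p_\theta(x) > p_\eta(x)\}$ is the half-line $\{x > (\theta+\eta)/2\}$ (and $\{x < (\theta+\eta)/2\}$ when $\theta < \eta$). On a finite grid the infimum in the definition of $\hat\theta$ is attained, so the $1/n$ slack is not needed.

```python
import bisect
import math
import random

def gauss_cdf(x, mean):
    """CDF of N(mean, 1) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def yatracos_sets(grid):
    """A_{theta,eta} for theta != eta, encoded as (threshold, upper): {x > t} if upper else {x < t}."""
    return [((th + eta) / 2.0, th > eta) for th in grid for eta in grid if th != eta]

def model_prob(theta, s):
    t, upper = s
    c = gauss_cdf(t, theta)
    return 1.0 - c if upper else c

def empirical_prob(sorted_z, s):
    t, upper = s
    n = len(sorted_z)
    if upper:
        return (n - bisect.bisect_right(sorted_z, t)) / n   # fraction of samples > t
    return bisect.bisect_left(sorted_z, t) / n              # fraction of samples < t

def min_distance_estimate(z, grid):
    """Return a minimizer over the grid of Delta_theta(z) = sup_A |P_theta(A) - P_z(A)|."""
    sets = yatracos_sets(grid)
    zs = sorted(z)
    emp = [empirical_prob(zs, s) for s in sets]             # empirical masses do not depend on theta
    def delta(theta):
        return max(abs(model_prob(theta, s) - e) for s, e in zip(sets, emp))
    return min(grid, key=delta)

random.seed(3)
grid = [round(-1.0 + 0.1 * i, 10) for i in range(21)]      # hypothetical candidate set {-1.0, -0.9, ..., 1.0}
z = [random.gauss(0.3, 1.0) for _ in range(2000)]          # Z^n drawn from the "true" source N(0.3, 1)
est = min_distance_estimate(z, grid)
print(est)
```

Sorting $z^n$ once and using binary search keeps the cost of each empirical probability logarithmic in $n$, which matters because $|\mathcal{A}_\Theta|$ grows quadratically with the grid size.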

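Next, we construct the lossy encoder $g$. Since $\Theta$ is bounded, it is contained in some cube $M$ of side $J \in \mathbb{N}$. Let $\{M^{(n)}_1, M^{(n)}_2, \ldots, M^{(n)}_K\}$ be a partitioning of $M$ into contiguous cubes of side $1/\lceil n^{1/2}\rceil$, so that $K = (J\lceil n^{1/2}\rceil)^k$. Represent each $M^{(n)}_j$ that intersects $\Theta$ by a unique fixed-length binary string $\tilde s_j$, and let $\tilde{\mathcal{S}} = \{\tilde s_j\}$. Then if a given $\theta \in \Theta$ is contained in $M^{(n)}_j$, map it to $\tilde s_j$, $g(\theta) = \tilde s_j$; since $\log K = k(\log\lceil n^{1/2}\rceil + \log J)$, this can be described by a string of no more than $\lceil k(\log\lceil n^{1/2}\rceil + \log J)\rceil$ bits. For each $M^{(n)}_j$ that intersects $\Theta$, choose a reproduction $\tilde\theta_j \in M^{(n)}_j \cap \Theta$ and designate $C^n_{\tilde s_j}$ as the corresponding $n$-block code $C^n_{\tilde\theta_j}$. The parameter decoder $\tilde\psi : \tilde{\mathcal{S}} \to \Theta$ is then given by $\tilde\psi(\tilde s_j) = \tilde\theta_j$.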
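The lossy parameter encoder $g$ and decoder $\tilde\psi$ amount to a uniform grid quantizer on the cube $M$. The Python sketch below is one way to realize them, under two assumptions not made in the paper: the cube's lower corner is passed in explicitly, and the reproduction $\tilde\theta_j$ is taken to be the cell center (which lies in $M^{(n)}_j \cap \Theta$ when, for example, $\Theta = M$). Indexing all $K = (J\lceil n^{1/2}\rceil)^k$ cells, rather than only those meeting $\Theta$, gives a header of $\lceil k\log_2(J\lceil n^{1/2}\rceil)\rceil$ bits.

```python
import math
import random

def ceil_sqrt(n):
    return math.isqrt(n - 1) + 1                    # ceil(sqrt(n)) for integers n >= 1

def make_param_quantizer(corner, J, n):
    """Grid quantizer on the cube M = corner + [0, J]^k with cells of side 1/ceil(sqrt(n)).

    Returns (g, psi, header_bits); `corner` and the cell-center reproduction are assumptions.
    """
    k = len(corner)
    m = ceil_sqrt(n)
    per_axis = J * m                                # K = per_axis ** k cells in total
    side = 1.0 / m
    header_bits = math.ceil(k * math.log2(per_axis))

    def g(theta):
        """Flat index of the cell containing theta (theta assumed to lie in M)."""
        j = 0
        for c, t in zip(corner, theta):
            i = min(int((t - c) / side), per_axis - 1)   # the upper face of M goes to the last cell
            j = j * per_axis + i
        return j

    def psi(j):
        """Center of cell j, used here as the reproduction point theta~_j."""
        idx = []
        for _ in range(k):
            j, i = divmod(j, per_axis)
            idx.append(i)
        idx.reverse()
        return tuple(c + (i + 0.5) * side for c, i in zip(corner, idx))

    return g, psi, header_bits

g, psi, bits = make_param_quantizer(corner=(0.0, 0.0), J=1, n=100)
random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
worst = max(math.dist(th, psi(g(th))) for th in points)
print(bits, worst, math.sqrt(2 / 100))
```

With $k = 2$, $J = 1$ and $n = 100$ there are $10^2$ cells and a 7-bit header, and every parameter is reproduced within $\sqrt{k/n} \approx 0.141$; with cell centers the worst case is actually half the cell diameter.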

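The rate of the resulting $(n,n)$-block code $C^{n,n}$ does not exceed $R + n^{-1}\lceil k(\log\lceil n^{1/2}\rceil + \log J)\rceil$ bits per letter, which proves (4). By (1), the average distortion of $C^{n,n}$ on the source $P_\theta$ is given by $D_\theta(C^{n,n}) = \mathbb{E}_\theta[D_\theta(C^n_{\tilde\theta})]$, where $\tilde\theta = \tilde\theta(Z^n) = \tilde\psi(\tilde f(Z^n))$. From standard quantizer mismatch arguments and the triangle inequality,

$$\delta^n_\theta(C^n_{\tilde\theta}) \le 4K\big[d_V(P_\theta, P_{\hat\theta}) + d_V(P_{\hat\theta}, P_{\tilde\theta})\big].$$

Taking expectations, we get

$$\delta^n_\theta(C^{n,n}) \le 4K\big\{\mathbb{E}_\theta[d_V(P_\theta, P_{\hat\theta})] + \mathbb{E}_\theta[d_V(P_{\hat\theta}, P_{\tilde\theta})]\big\}. \tag{9}$$

We now estimate separately each term in the curly brackets in (9). Using (8), we can bound the first term by

$$\mathbb{E}_\theta[d_V(P_\theta, P_{\hat\theta})] \le c\sqrt{\log n/n} + \frac{3}{2n}. \tag{10}$$

The second term involves $\hat\theta = \hat\theta(Z^n)$ and its quantized version $\tilde\theta$, where $\|\hat\theta - \tilde\theta\| \le \sqrt{k/n}$ by construction of $g$ and $\tilde\psi$ (both points lie in the same cube of side $1/\lceil n^{1/2}\rceil$, whose diameter is $\sqrt{k}/\lceil n^{1/2}\rceil \le \sqrt{k/n}$). Using the assumption that the map $\theta \mapsto P_\theta$ is uniformly locally Lipschitz, as well as the fact that $d_V(P, Q) \le 1$ for any two distributions $P, Q$, it is not hard to show that there exists a constant $\beta'$ such that

$$d_V(P_{\hat\theta}, P_{\tilde\theta}) \le \beta'\sqrt{k/n} \tag{11}$$

and consequently that

$$\mathbb{E}_\theta[d_V(P_{\hat\theta}, P_{\tilde\theta})] \le \beta'\sqrt{k/n}. \tag{12}$$

Substituting the bounds (10) and (12) into (9) yields

$$\delta^n_\theta(C^{n,n}) \le K\Big(4c\sqrt{\log n/n} + \frac{6}{n} + 4\beta'\sqrt{k/n}\Big),$$

whence it follows that the $n$th-order redundancy $\delta^n_\theta(C^{n,n}) = O(\sqrt{\log n/n})$ for every $\theta \in \Theta$. Then the decomposition

$$\delta_\theta(C^{n,n}) = \delta^n_\theta(C^{n,n}) + D^n_\theta(R) - D_\theta(R)$$

and (2) imply that (5) holds for every $\theta \in \Theta$.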

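To prove (6), fix an $\epsilon > 0$ and note that by (7), (11) and the triangle inequality, $d_V(P_\theta, P_{\tilde\theta}) > \epsilon$ implies that $2\Delta_\theta(Z^n) + 3/(2n) + \beta'\sqrt{k/n} > \epsilon$. Hence,

$$\Pr\{d_V(P_\theta, P_{\tilde\theta}) > \epsilon\} \le \Pr\big\{\Delta_\theta(Z^n) > (\epsilon - \gamma/\sqrt{n})/2\big\},$$

where $\gamma = 3/2 + \beta'\sqrt{k}$ (note that $3/(2n) \le 3/(2\sqrt{n})$). Since $\mathcal{A}_\Theta$ is a VC class,

$$\Pr\{d_V(P_\theta, P_{\tilde\theta}) > \epsilon\} \le 8 n^{V(\mathcal{A}_\Theta)} e^{-n(\epsilon - \gamma/\sqrt{n})^2/128}. \tag{13}$$

If for each $n$ we choose $\epsilon_n = \sqrt{128(V(\mathcal{A}_\Theta) + 2)\ln n/n} + \gamma/\sqrt{n}$, then the right-hand side of (13) with $\epsilon = \epsilon_n$ equals $8n^{-2}$ and is therefore summable in $n$; hence $d_V(P_\theta, P_{\tilde\theta(Z^n)}) = O(\sqrt{\log n/n})$ $P_\theta$-a.s. by the Borel-Cantelli lemma. ■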
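One concrete choice that makes the right-hand side of (13) summable is $\epsilon_n = \sqrt{128(V(\mathcal{A}_\Theta) + 2)\ln n/n} + \gamma/\sqrt{n}$, for which the bound collapses to $8n^{-2}$. The Python sketch below checks this numerically for hypothetical values of $V(\mathcal{A}_\Theta)$, $\beta'$ and $k$ (the paper does not fix them) and shows $\epsilon_n$ shrinking at the rate $\sqrt{\ln n/n}$.

```python
import math

def eps_n(n, V, gamma):
    """epsilon_n = sqrt(128 (V + 2) ln n / n) + gamma / sqrt(n)."""
    return math.sqrt(128 * (V + 2) * math.log(n) / n) + gamma / math.sqrt(n)

def rhs_13(n, eps, V, gamma):
    """Right-hand side of (13): 8 n^V exp(-n (eps - gamma / sqrt(n))^2 / 128)."""
    t = eps - gamma / math.sqrt(n)
    return 8 * n ** V * math.exp(-n * t * t / 128)

V, beta_prime, k = 3, 2.0, 2                 # hypothetical V(A_Theta), beta', k
gamma = 1.5 + beta_prime * math.sqrt(k)      # gamma = 3/2 + beta' sqrt(k)
for n in (10, 100, 1000, 10 ** 5):
    e = eps_n(n, V, gamma)
    print(n, e, rhs_13(n, e, V, gamma), 8 / n ** 2)

# partial sum of the bounds; the full series over n >= 2 equals 8 (pi^2/6 - 1)
partial = sum(rhs_13(n, eps_n(n, V, gamma), V, gamma) for n in range(2, 10 ** 5))
print(partial, 8 * (math.pi ** 2 / 6 - 1))
```

The extra $+2$ in the exponent is what buys summability: with $V(\mathcal{A}_\Theta)$ alone the polynomial factor $n^{V(\mathcal{A}_\Theta)}$ would only be cancelled, leaving a constant bound.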
The above proof combines techniques of Rissanen [1] (namely, explicit identification of the source parameters) with the parameter-space quantization idea of Chou et al. [4]. The VC condition on the Yatracos class $\mathcal{A}_\Theta$ is needed to control the $L_1$ convergence rate of the density estimators, which bounds the convergence rate of the distortion redundancies. We remark also that the boundedness condition on the distortion function can be relaxed in favor of a uniform moment condition with respect to a reference letter, but at the expense of a quadratic slowdown of the rate at which the distortion redundancy converges to zero (the proof is omitted for lack of space):

Theorem 2. Let $\{P_\theta : \theta \in \Theta\}$ be a family of i.i.d. sources satisfying the conditions of Theorem 1, and let $\rho$ be a distortion function for which there exists a reference letter $a^* \in \hat{\mathcal{X}}$ such that

$$\sup_{\theta \in \Theta} \mathbb{E}_\theta[\rho(X, a^*)^2] < \infty,$$

and which satisfies Condition (D.1). Then for any rate $R > 0$ satisfying $\sup_{\theta \in \Theta} D_\theta(R) < \infty$ there exists a sequence $\{C^{n,n}\}_{n=1}^{\infty}$ of $(n,n)$-block codes with $R(C^{n,n}) = R + O(\log n/n)$ and $\delta_\theta(C^{n,n}) = O(\sqrt[4]{\log n/n})$ for every $\theta \in \Theta$. The source identification performance is the same as in Theorem 1.

V. Examples 

Here, we present two explicit examples of parametric families satisfying the conditions of Theorem 1 and thus admitting joint universal lossy coding and identification schemes.
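Mixture classes. Let $p_1, \ldots, p_k$ be fixed pdf's over a measurable domain $\mathcal{X} \subseteq \mathbb{R}^d$, and let $\Theta$ be the simplex of probability distributions on $\{1, \ldots, k\}$. Then the mixture class defined by the $p_i$'s consists of all densities of the form $p_\theta(x) = \sum_{i=1}^k \theta_i p_i(x)$, $\theta = (\theta_1, \ldots, \theta_k) \in \Theta$. The parameter space $\Theta$ is compact and thus satisfies Condition 1) of Theorem 1. In order to show that Condition 2) holds, fix any $\theta, \eta \in \Theta$. Then, since $d_V(P_\theta, P_\eta) = \frac{1}{2}\int |p_\theta(x) - p_\eta(x)|\,dx$ and each $p_i$ integrates to one,

$$d_V(P_\theta, P_\eta) \le \frac{1}{2}\sum_{i=1}^k |\theta_i - \eta_i| \le \frac{\sqrt{k}}{2}\|\theta - \eta\|,$$

where the last inequality follows by concavity of the square root. Therefore, the map $\theta \mapsto P_\theta$ is everywhere Lipschitz with Lipschitz constant $\sqrt{k}/2$. It remains to show that the Yatracos class $\mathcal{A}_\Theta$ is VC. To this end, observe that $x \in A_{\theta,\eta}$ if and only if $\sum_{i=1}^k (\theta_i - \eta_i) p_i(x) > 0$. Thus $\mathcal{A}_\Theta$ consists of sets of the form

$$\Big\{x \in \mathcal{X} : \sum_{i=1}^k a_i p_i(x) > 0\Big\}, \qquad (a_1, \ldots, a_k) \in \mathbb{R}^k.$$

Since the functions $p_1, \ldots, p_k$ span a linear space of dimension not larger than $k$, Lemma 4.2 in [6] guarantees that $V(\mathcal{A}_\Theta) \le k$.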
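Both inequalities in the Lipschitz bound for mixtures can be checked numerically on a concrete mixture class. The Python sketch below assumes three hypothetical component densities on $\mathcal{X} = [0,1]$ ($p_1(x) = 1$, $p_2(x) = 2x$, $p_3(x) = 3x^2$), approximates $d_V(P_\theta, P_\eta) = \frac{1}{2}\int |p_\theta - p_\eta|\,dx$ with a midpoint rule, and tests both inequalities for random pairs of points in the simplex.

```python
import math
import random

# Hypothetical component densities p_1, p_2, p_3 on X = [0, 1]
COMPONENTS = [lambda x: 1.0, lambda x: 2.0 * x, lambda x: 3.0 * x * x]
N = 1000
GRID = [(j + 0.5) / N for j in range(N)]   # midpoints for the midpoint rule on [0, 1]

def mixture_dv(theta, eta):
    """Midpoint-rule approximation of d_V(P_theta, P_eta) = (1/2) * integral |p_theta - p_eta|."""
    total = sum(abs(sum((t - e) * p(x) for t, e, p in zip(theta, eta, COMPONENTS))) for x in GRID)
    return 0.5 * total / N

def random_simplex_point(k):
    # normalized i.i.d. exponentials give a uniformly distributed point of the simplex
    w = [random.expovariate(1.0) for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]

random.seed(4)
k = len(COMPONENTS)
ok = True
for _ in range(100):
    theta, eta = random_simplex_point(k), random_simplex_point(k)
    dv = mixture_dv(theta, eta)
    half_l1 = 0.5 * sum(abs(t - e) for t, e in zip(theta, eta))
    lipschitz = math.sqrt(k) / 2 * math.dist(theta, eta)
    ok = ok and dv <= half_l1 + 1e-12 and half_l1 <= lipschitz + 1e-12
print(ok)
```

The first inequality holds pointwise in $x$ (the triangle inequality inside the integral), so it survives discretization; the second is the Cauchy-Schwarz-type step $\sum_i |a_i| \le \sqrt{k}\,\|a\|$.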
Exponential families. Let $\mathcal{X}$ be a measurable subset of $\mathbb{R}^d$, and let $\Theta$ be a compact subset of $\mathbb{R}^k$. A family $\{p_\theta : \theta \in \Theta\}$ of probability densities on $\mathcal{X}$ is an exponential family [11] if there exist a probability density $p$ and $k$ real-valued functions $h_1, \ldots, h_k$ on $\mathcal{X}$, such that each $p_\theta$ has the form $p_\theta(x) = p(x)e^{\theta \cdot h(x) - g(\theta)}$, where $h(x) = (h_1(x), \ldots, h_k(x))$, $\theta \cdot h(x) = \sum_{i=1}^k \theta_i h_i(x)$, and $g(\theta) = \ln \int_{\mathcal{X}} e^{\theta \cdot h(x)} p(x)\,dx$ is the normalization constant. Given the densities $p$ and $p_\theta$, let $P$ and $P_\theta$ denote the corresponding distributions. By the compactness of $\Theta$, Condition 1) of Theorem 1 is satisfied. Next we demonstrate that Conditions 2) and 3) can also be met under certain regularity assumptions.

Namely, suppose that $\{1, h_1, \ldots, h_k\}$ is a linearly independent set of functions. This guarantees that the map $\theta \mapsto P_\theta$ is one-to-one. We also assume that each $h_i$ is square-integrable with respect to $P$: $\int h_i^2\,dP < \infty$, $1 \le i \le k$. Then the $(k+1)$-dimensional real linear space $\mathcal{F} \subset L^2(\mathcal{X}, P)$ spanned by $\{1, h_1, \ldots, h_k\}$ can be equipped with an inner product $\langle f_1, f_2 \rangle = \int f_1 f_2\,dP$ and the corresponding $L^2$ norm $\|f\|_2 = \sqrt{\langle f, f \rangle} = \sqrt{\int f^2\,dP}$. Also let $\|f\|_\infty \triangleq \inf\{M : |f(x)| \le M \ P\text{-a.e.}\}$ denote the $L^\infty$ norm of $f$. Since $\mathcal{F}$ is finite-dimensional, there exists a constant $A_k > 0$ such that $\|f\|_\infty \le A_k \|f\|_2$ for all $f \in \mathcal{F}$. Finally, assume that the logarithms of the Radon-Nikodym derivatives $dP/dP_\theta = p/p_\theta$ are uniformly bounded $P$-a.e.: $\sup_{\theta \in \Theta} \|\ln(p/p_\theta)\|_\infty = L < \infty$. These conditions are satisfied, for example, by truncated Gaussian densities over a compact domain in $\mathbb{R}^d$ with suitably bounded means and covariance matrices.

Let $D(P_\theta \| P_\eta)$ denote the relative entropy (information divergence) between $P_\theta$ and $P_\eta$. With the above conditions in place, we can prove the following result along the lines of Lemma 4 of Barron and Sheu [11]:

$$D(P_\theta \| P_\eta) \le \frac{1}{2} e^{\|\ln(p/p_\theta)\|_\infty} e^{2A_k\|\theta - \eta\|} \|\theta - \eta\|^2, \tag{14}$$

where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^k$. From Pinsker's inequality $d_V(P_\theta, P_\eta) \le \sqrt{D(P_\theta \| P_\eta)/2}$ [12, Lemma 5.2.8], (14), and the uniform boundedness of $\ln(p/p_\theta)$, we get

$$d_V(P_\theta, P_\eta) \le \beta_0 e^{A_k\|\theta - \eta\|} \|\theta - \eta\|, \qquad \theta, \eta \in \Theta, \tag{15}$$

where $\beta_0 = e^{L/2}/2$. If we fix $\theta \in \Theta$, then from (15) it follows that for any $r > 0$, $d_V(P_\theta, P_\eta) \le \beta_0 e^{A_k r}\|\theta - \eta\|$ for all $\eta$ satisfying $\|\eta - \theta\| < r$. That is, the family $\{P_\theta : \theta \in \Theta\}$ satisfies the uniform local Lipschitz condition of Theorem 1, and the magnitude of the Lipschitz constant can be controlled by tuning $r$.

All we have left to show is that the Yatracos class $\mathcal{A}_\Theta$ is a VC class. Since $p_\theta(x) > p_\eta(x)$ if and only if $(\theta - \eta) \cdot h(x) > g(\theta) - g(\eta)$, $\mathcal{A}_\Theta$ consists of sets of the form

$$\Big\{x \in \mathcal{X} : a_0 + \sum_{i=1}^k a_i h_i(x) > 0\Big\}, \qquad (a_0, \ldots, a_k) \in \mathbb{R}^{k+1}.$$

Since the functions $1, h_1, \ldots, h_k$ span a $(k+1)$-dimensional linear space, the same argument as that used for mixture classes shows that $V(\mathcal{A}_\Theta) \le k + 1$.
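For a concrete exponential family, the Yatracos sets can be computed from the linear-threshold description above, and they also recover the variational distance through Scheffé's identity $d_V(P_\theta, P_\eta) = P_\theta(A_{\theta,\eta}) - P_\eta(A_{\theta,\eta})$. The Python sketch below assumes a hypothetical truncated-Gaussian family on $\mathcal{X} = [0,1]$ with base density $p \equiv 1$ and $h(x) = (x, x^2)$; $g(\theta)$ and all integrals are approximated with a midpoint rule on a common grid, on which both sides of the identity agree up to rounding.

```python
import math

M = 20000
GRID = [(j + 0.5) / M for j in range(M)]   # midpoint rule on X = [0, 1]

def h(x):
    # hypothetical sufficient statistics h(x) = (x, x^2); with base density p = 1 on [0, 1]
    # this gives truncated Gaussian densities on [0, 1]
    return (x, x * x)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log_partition(theta):
    """g(theta) = ln of the integral over [0, 1] of exp(theta . h(x)), approximated on GRID."""
    return math.log(sum(math.exp(dot(theta, h(x))) for x in GRID) / M)

def density_values(theta):
    g = log_partition(theta)
    return [math.exp(dot(theta, h(x)) - g) for x in GRID]

def yatracos_indicator(theta, eta):
    """1{x in A_{theta,eta}} via the linear test (theta - eta) . h(x) > g(theta) - g(eta)."""
    diff = [t - e for t, e in zip(theta, eta)]
    threshold = log_partition(theta) - log_partition(eta)
    return [dot(diff, h(x)) > threshold for x in GRID]

theta, eta = (1.0, -3.0), (-0.5, 1.0)
p_theta, p_eta = density_values(theta), density_values(eta)
in_A = yatracos_indicator(theta, eta)
dv_direct = 0.5 * sum(abs(a - b) for a, b in zip(p_theta, p_eta)) / M
dv_scheffe = sum(a - b for a, b, inside in zip(p_theta, p_eta, in_A) if inside) / M
print(dv_direct, dv_scheffe)
```

The membership test only needs $(\theta - \eta) \cdot h(x)$ and one scalar threshold, which is exactly the affine form in $(1, h_1, \ldots, h_k)$ that bounds the VC dimension by $k + 1$.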

VI. Conclusion and future work 
We have presented a constructive proof of the existence of a scheme for joint universal lossy block coding and identification of real i.i.d. vector sources with parametric marginal distributions satisfying certain regularity conditions. Our main motivation was to show that the connection between universal coding and source identification, exhibited by Rissanen for lossless coding of discrete-alphabet sources [1], carries over to the domain of lossy codes and continuous alphabets. As far as future work is concerned, it would be of both theoretical and practical interest to extend the approach described here to variable-rate codes (thus lifting the boundedness requirement for the parameter space) and to general (not necessarily memoryless) stationary sources.